Skip to content

RBAC overhaul#804

Merged
alex-bezek merged 11 commits into
mainfrom
alex/rbac-overhaul
May 6, 2026
Merged

RBAC overhaul#804
alex-bezek merged 11 commits into
mainfrom
alex/rbac-overhaul

Conversation

@alex-bezek
Copy link
Copy Markdown
Collaborator

@alex-bezek alex-bezek commented Apr 9, 2026

related: #K8SOP-246

What

Overhauls RBAC across all three operator deployments (api-manager, agent-manager, bindings-forwarder) to:

  1. Hand-manage all RBAC in Helm templates — removes controller-gen RBAC generation and all 73 kubebuilder:rbac markers from Go code. This follows the pattern used by cert-manager, Kyverno, ArgoCD, and other multi-deployment operators.
  2. Reorganize into per-component directories — each deployment's templates (deployment, SA, RBAC) live in their own directory (api-manager/, agent/, bindings-forwarder/).
  3. Add namespace-scoping support — when watchNamespace is set, namespace-scoped resources use a Role instead of a ClusterRole. Cluster-scoped resources (namespaces, ingressclasses, gatewayclasses) remain in a ClusterRole.
  4. Fix all known RBAC bugs (see below).

How

Bug fixes

  • Delete 4 stale CRD access roles referencing non-existent CRDs (bindingconfiguration, operatorconfiguration)
  • Fix boundendpoint editor/viewer roles — wrong API group (ngrok.k8s.ngrok.combindings.k8s.ngrok.com)
  • Remove dead proxy-role ClusterRole (leftover kube-rbac-proxy scaffolding)
  • Fold secret-manager-role into api-manager role (add secret create/update/patch)
  • Remove stale tunnels rules from agent ClusterRole (Tunnel CRD no longer exists)
  • Remove duplicate events rule from bindings-forwarder
  • Remove misapplied clusterRole.annotations from operator-internal roles

Restructure

  • Move controller-*.yaml templates into templates/api-manager/ directory
  • Split monolithic RBAC files into per-resource files (role.yaml, rolebinding.yaml, etc.)
  • Rename service-account.yamlserviceaccount.yaml for consistency
  • Move CRD access roles into templates/rbac/crd-access/ with consistent naming
  • Add missing CRD access roles for AgentEndpoint, KubernetesOperator, NgrokTrafficPolicy

Go / build changes

  • Remove all 73 +kubebuilder:rbac markers from 18 Go files (comment-only changes)
  • Remove rbac:roleName=... and output:rbac:... from controller-gen command in generate.mk

values.yaml

  • Rename clusterRolecrdAccessRoles (only applies to CRD access editor/viewer roles)
  • Add crdAccessRoles.create toggle

Namespace-scoping

  • New isNamespaced and watchNamespace helpers in _helpers.tpl
  • Each component gets role.yaml (ClusterRole, default) and role-namespaced.yaml (Role, watchNamespace) with simple if/if not guards
  • api-manager splits into: Role (namespace-scoped resources) + ClusterRole (cluster-scoped resources) in watchNamespace mode

Verification

  • RBAC baseline captured from kind cluster before refactor (specs/rbac/)
  • After refactor: redeployed, diffed kubectl auth can-i --list per SA against baseline
  • All intentional diffs documented in design doc

Used a script to dump permissions for all our service accounts across the namespaces default, ngrok-operator where its installed, and test-ns-to-watch which would be sometimes used for watchNamespace. Then used diff to see

  • main vs main-watchNamespace has no diffs as expected
  • main vs branch
    • config maps got patch. while not needed technically, they had update, so this is just a consistency thing
    • removed cluster scoped k8soperator crd access as it only needs it in its own namespace. it still has access in the ngrok-operator namespace though
    • removed old things that aren't used that were skaffolded by kubebuilder like subjectaccessReview and tokenreview
    • secrets got delete access removed as i couldn't find where we used it and seemed worth limiting the scope
  • for branch vs branch-watchNamespace there are lots of diffs removing all the crd access from the individual namespaces but remain in the test ns to watch. bindings is also unaffected

Breaking Changes

  • clusterRole.annotations renamed to crdAccessRoles.annotations in values.yaml. Users setting this value will need to update their Helm values.
  • proxy-role ClusterRole removed (grants tokenreviews and subjectaccessreviews). This was dead scaffolding from kube-rbac-proxy which was never deployed. No functional impact.
  • Bindings-forwarder pods ClusterRole removed — pods access moved to the namespaced Role. Only affects users who relied on the forwarder reading pods across namespaces (it doesn't — it only watches pods in its own namespace).

@github-actions github-actions Bot added area/controller Issues dealing with the controller area/helm-chart Issues dealing with the helm chart size/XXL Denotes a PR that changes 1000+ lines labels Apr 9, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 0% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.77%. Comparing base (962d5d7) to head (7d99cb8).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
cmd/agent-manager.go 0.00% 10 Missing ⚠️
cmd/api-manager.go 0.00% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #804      +/-   ##
==========================================
- Coverage   53.38%   52.77%   -0.62%     
==========================================
  Files         101      101              
  Lines       11325    11337      +12     
==========================================
- Hits         6046     5983      -63     
- Misses       4852     4915      +63     
- Partials      427      439      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Captures the motivation, constraints, and design decisions for the RBAC
overhaul before the implementation changes land.
…r-gen RBAC output

RBAC is now defined explicitly in Helm templates rather than generated
from code annotations. Removes all +kubebuilder:rbac markers from
controllers and drain.go, and drops the rbac output target from
controller-gen so it no longer clobbers the Helm-managed files.
@alex-bezek alex-bezek force-pushed the alex/rbac-overhaul branch 2 times, most recently from 6bf98aa to 7b6ac71 Compare April 29, 2026 19:40
… Helm templates

Replaces the monolithic controller-rbac.yaml and per-component rbac.yaml
files with a consistent per-component directory structure (agent/,
api-manager/, bindings-forwarder/). Each component now owns its own
Role, RoleBinding, and optional namespace-scoped variants.

Key changes:
- agent: split rbac.yaml into role.yaml + rolebinding.yaml with
  optional namespaced variants for namespace-scoped installs
- api-manager: moved from templates/rbac/role.yaml into dedicated
  api-manager/ directory alongside its other templates; adds
  leader-election-role.yaml and namespaced role support
- bindings-forwarder: renamed rbac.yaml -> role.yaml for consistency
- Deleted controller-rbac.yaml (replaced by api-manager/role.yaml)
- Renamed controller-{cm,deployment,pdb,serviceaccount}.yaml into
  api-manager/ directory for cohesion
- Renamed service-account.yaml -> serviceaccount.yaml everywhere
- values.yaml/schema: adds crdAccessRoles and per-component RBAC flags
Moves existing editor/viewer roles into a dedicated rbac/crd-access/
subdirectory with consistent naming, and adds new roles for
NgrokTrafficPolicy (previously missing).

These ClusterRoles are for users of the operator — granting cluster
members read or write access to ngrok CRDs — as opposed to the
operator's own service account permissions.
…tion

Updates all Helm unit tests and snapshots to match the reorganized
template structure (per-component directories, renamed files). Adds
new test suites for api-manager RBAC and crd-access roles.

Also adds a chainsaw e2e test that verifies the operator's service
accounts have exactly the permissions they need — no more, no less.
Regenerates manifest-bundle.yaml and updates the Helm README to
reflect the new values added for per-component RBAC configuration.
@jonstacks jonstacks changed the title Alex/rbac overhaul RBAC overhaul Apr 30, 2026
@alex-bezek alex-bezek marked this pull request as ready for review May 6, 2026 14:14
@alex-bezek alex-bezek requested a review from a team as a code owner May 6, 2026 14:14
@alex-bezek alex-bezek added this pull request to the merge queue May 6, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 6, 2026
@alex-bezek alex-bezek enabled auto-merge May 6, 2026 15:12
@alex-bezek alex-bezek added this pull request to the merge queue May 6, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 6, 2026
@alex-bezek alex-bezek added this pull request to the merge queue May 6, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks May 6, 2026
@alex-bezek alex-bezek enabled auto-merge May 6, 2026 18:00
@alex-bezek alex-bezek added this pull request to the merge queue May 6, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 6, 2026
@alex-bezek alex-bezek added this pull request to the merge queue May 6, 2026
Merged via the queue into main with commit 3187082 May 6, 2026
18 checks passed
@alex-bezek alex-bezek deleted the alex/rbac-overhaul branch May 6, 2026 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/controller Issues dealing with the controller area/helm-chart Issues dealing with the helm chart size/XXL Denotes a PR that changes 1000+ lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants